ABSTRACT

Ribosome biogenesis is a tightly regulated, multi- 
stepped process. The assembly of ribosomal 
subunits is a central step of the complex biogenesis 
process, involving nearly 30 protein factors in vivo in 
bacteria. Although the assembly process has been 
extensively studied in vitro for over 40 years, very
little is known about the in vivo process and the
specific roles of assembly factors. One such
example is ribosome maturation factor
M (RimM), a factor involved in the late-stage 
assembly of the 30S subunit. Here, we combined 
quantitative mass spectrometry and cryo-electron 
microscopy to characterize the in vivo 30S 
assembly intermediates isolated from mutant 
Escherichia coli strains with genes for assembly 
factors deleted. Our compositional and structural 
data show that the assembly of the 3'-domain of
the 30S subunit is severely delayed in these
intermediates, which feature highly underrepresented
3'-domain proteins and large conformational
differences compared with the mature 30S subunit. Further
analysis indicates that RimM functions not only to
promote the assembly of a few 3'-domain proteins
but also to stabilize the rRNA tertiary structure. 
More importantly, this study reveals intriguing 
similarities and dissimilarities between the in vitro 
and the in vivo assembly pathways, suggesting 
that they are in general similar but with subtle 
differences. 



INTRODUCTION 

Ribosome biogenesis is a tightly regulated multi-stepped 
process, assisted by a wide variety of protein factors, such 
as transcription factors, endoribonucleases, rRNA 
helicases and chaperones, rRNA and ribosomal protein 
modification enzymes and assembly factors (1). As to 
the 30S subunit, early in vitro reconstitution experiments 
(2-6) have demonstrated that active 30S subunits could be 
formed from purified ribosomal proteins and 16S rRNA 
in the absence of other cellular components. The in vitro 
assembly occurs very slowly and requires
non-physiological conditions, such as high Mg²⁺
concentration, high ionic strength and heat shock. In contrast, the
assembly of the 30S subunit in vivo starts with rRNA 
primary transcripts (7) and occurs co-transcriptionally 
(8) in a much more efficient way, underscoring the essen- 
tial contribution of assembly factors. In recent years, 
application of new techniques, such as pulse-chase moni- 
tored by quantitative mass spectrometry (PC/QMS) (9), 
time-resolved X-ray footprinting (10) and time-resolved 
electron microscopy (11), has brought our understanding 
of the in vitro assembly process to a new level, providing a 
large amount of valuable kinetic and structural informa- 
tion. Together with earlier work [reviewed in (12)], these 
data have established that the in vitro 30S subunit 
assembly starts from multiple sites on the 16S rRNA 
(10), following parallel pathways (9-11) and the free 
energy of the assembly can be represented by a complex 
landscape (9). More importantly, kinetic data revealed 
that for several subsets of 3'-domain proteins, the thermo- 
dynamic interdependence does not align well with 
measured kinetic cooperativity (11,13), and at these
locations, the in vitro assembly often encounters kinetic traps,
suggesting that assembly factors might be involved in
subverting kinetic traps in the assembly landscape (9,11,13).

Over the past two decades, accumulating experimental 
data, mainly through genetic approaches, has implicated a 
number of factors, including RbfA, RsgA, KsgA, Era, 
ribosome maturation factor M (RimM), RimP, RimJ 
[reviewed in (12)] and YqeH (14,15), in the maturation of
the 30S subunit in bacteria. However, the specific molecular 
roles of most of these factors remain unclear. Among these 
factors, RimM was first identified as a factor required for
fast growth in rich medium (16). The gene encoding RimM
(yfjA) in Escherichia coli is located in the trmD operon
(17) with genes for ribosomal proteins S16 and L19, and a 
tRNA methyltransferase (TrmD), a hint that RimM might 
be directly involved in ribosome-related function. Indeed, 
deletion of RimM confers a slow growth phenotype (18), 
with accumulation of 16S rRNA precursors and free 30S 
subunits (19) as well as reduced level of polysomes (20). 
RimM associates with free 30S subunit in vivo (18,20) and 
also binds to S19 in vitro (20,21). Moreover, suppressor
mutations to the ΔrimM mutant were found on S13 (18) and
suppressor mutations to a rimM-Y106AY107A mutant were
found on S19 and on helices 31 and 33b of the 16S rRNA (20).

In this study, we characterize the immature 30S subunits 
purified from an E. coli ΔrimM strain biochemically and
structurally. Our data indicate that the immature 30S 
subunits are a collection of assembly intermediates, with 
the 3'-head domain proteins severely underrepresented, 
such as S10, S14, S13 and S19. Moreover, protein com- 
position analysis of another category of immature 30S 
subunits from a ΔrbfAΔrsgA strain shows a different
spectrum, with much enhanced levels for these proteins, 
suggesting that RimM promotes the assembly of these 
slow binding proteins in vivo. Structural analysis shows 
that these ΔrimM intermediates also differ largely in
rRNA conformation, particularly the rotational position 
of the 3'-head domain relative to the body domain. An 
incubation of recombinant RimM with the immature 30S 
subunits significantly reduces the flexibility of the head 
domain. More importantly, our data also suggest that 
the in vivo assembly process occurs along multiple
pathways to a certain degree as well, and that rRNA
maturation is tightly coupled with ribosomal protein binding.
The functional depiction of RimM thus illustrates that 
there are possible checkpoints along the in vivo assembly 
pathways where maturation factors come into play to 
direct the process to more efficient branches. 

MATERIALS AND METHODS 

Escherichia coli strains 

We used E. coli A19 (Hfr, rna-19, gdhA2, his-95, relA1,
spoT1, metB1) (22) as the source of the 30S subunit.
A19ΔrimM is an A19 derivative in which the rimM gene
is replaced by a short peptide gene containing an FRT
sequence, constructed as follows. The kanamycin-resistance
marker of a rimM disruptant from the Keio collection (23), in
which the rimM gene has been substituted by an
FRT-flanked kanamycin-resistance cassette, was transduced
into A19 using phage P1vir to produce an intermediate
strain. Then, the kanamycin-resistance cassette was
removed from the intermediate strain using the FLP
expression plasmid pCP20 (24) to produce the A19ΔrimM strain.
A19ΔrbfAΔrsgA is an A19 derivative in which both the
rbfA and rsgA genes are replaced by a short peptide gene
containing an FRT sequence, constructed by transducing
rbfA::FRT-kan-FRT into A19 and removing kan using
pCP20, and then transducing rsgA::FRT-kan-FRT into
the resulting strain and removing kan using pCP20.
The sources of rbfA::FRT-kan-FRT and rsgA::FRT-kan-FRT
are intermediate strains produced during the construction
of W3110ΔrbfA (25) and the rsgA-disrupted strain of the Keio
collection (23), respectively. Both the A19ΔrimM and
A19ΔrbfAΔrsgA strains were confirmed by polymerase
chain reaction (PCR). 

Spot assay and ribosome profile 

A19, A19ΔrimM and A19ΔrbfAΔrsgA strains were
grown in liquid LB at 37°C to OD 0.8 and diluted to a
series of concentrations, 10⁰, 10⁻¹, 10⁻², 10⁻³, 10⁻⁴ and
10⁻⁵. Three microliters of each dilution was spotted onto an
LB plate and incubated at 37°C overnight. The cell
extracts from the A19, A19ΔrimM and A19ΔrbfAΔrsgA
strains were loaded onto a 10-40% sucrose gradient
containing 10 mM Mg(OAc)₂ and centrifuged for 3.5 h at
39 000 rpm in a SW41 rotor (Beckman Coulter). The
gradients were analyzed by A254 absorbance using a
Teledyne ISCO fractionation system.

Immature and mature 30S subunit purification 

Escherichia coli cells (A19, A19ΔrimM and
A19ΔrbfAΔrsgA strains) grown in LB medium were
harvested, lysed and clarified in opening buffer [20 mM
Tris-HCl (pH = 7.5), 150 mM NH₄Cl, 10 mM Mg(OAc)₂ and
0.5 mM ethylenediaminetetraacetic acid (EDTA)]. The
lysate was loaded onto the top of a 5 ml sucrose cushion
[20 mM Tris-HCl (pH = 7.5), 150 mM NH₄Cl, 10 mM
Mg(OAc)₂, 0.5 mM EDTA and 1.1 M sucrose] and
centrifuged for 18 h at 28 000 rpm in a 70Ti rotor (Beckman
Coulter). The resulting pellets were resuspended in binding
buffer and centrifuged through a 10-40% sucrose gradient
with 10 mM Mg(OAc)₂ for 7 h at 30 000 rpm in a SW32
rotor (Beckman Coulter). Fractions containing the
immature 30S and 70S peaks were pooled separately and
concentrated, with the buffer changed to binding buffer for the
30S fractions and to separation buffer [20 mM Tris-HCl
(pH = 7.5), 150 mM NH₄Cl and 2 mM Mg(OAc)₂] for the
70S fractions. The 70S fractions were further centrifuged
through a 10-40% sucrose gradient with 2 mM Mg(OAc)₂
to obtain the mature 30S and 50S subunits.

RimM preparation, rRNA extraction and identification of
the 3' and 5' ends of the 17S rRNA

Full details are available in the Supplementary Data.

Pelleting assay

Mature or immature 30S subunits (2.5 pmol) were
incubated with a 30-fold excess of RimM for 15 min at
37°C in binding buffer. The mixture was then layered
onto a 150 µl sucrose cushion and centrifuged at
96 409 rpm for 4 h in a TLA-120.1 rotor (Beckman
Coulter). The pellets and the supernatants were separated 
and 1/2 of total pellets and 1/20 of supernatants were 
resolved by 12% sodium dodecyl sulfate-polyacrylamide 
gel electrophoresis (SDS-PAGE). 
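
As a rough worked example of the quantities in this assay: with 2.5 pmol of 30S subunits and a 30-fold excess, about 75 pmol of RimM enters each reaction, and because 1/2 of the pellet but only 1/20 of the supernatant is loaded on the gel, band intensities need to be rescaled before a bound fraction can be estimated. The sketch below assumes hypothetical densitometry readings; the function names and intensity values are illustrative and are not taken from this study.

```python
def rimm_input_pmol(subunit_pmol: float, fold_excess: float) -> float:
    """Amount of RimM (pmol) added for a given molar excess over 30S subunits."""
    return subunit_pmol * fold_excess


def bound_fraction(pellet_band: float, supernatant_band: float,
                   pellet_loaded: float = 1 / 2,
                   supernatant_loaded: float = 1 / 20) -> float:
    """Estimate the fraction of RimM that co-pellets with the subunits.

    Band intensities are hypothetical densitometry values (arbitrary units).
    Each band is scaled back to the whole pellet or supernatant using the
    fraction of that sample that was loaded on the gel.
    """
    pellet_total = pellet_band / pellet_loaded
    supernatant_total = supernatant_band / supernatant_loaded
    total = pellet_total + supernatant_total
    if total == 0:
        raise ValueError("no RimM signal in either fraction")
    return pellet_total / total


if __name__ == "__main__":
    print(rimm_input_pmol(2.5, 30))          # pmol of RimM per reaction
    print(bound_fraction(10.0, 19.0))        # illustrative intensities only
```

In practice, the RimM-alone control lane would be used to subtract any RimM that pellets in the absence of subunits; that correction is omitted here for brevity.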

Quantitative mass spectrometry 

For quantitation of targeted proteins, samples with the same
A260 absorption value were separated by 1D
Tricine-SDS-PAGE. Among all ribosomal proteins, S1
was not included in the QMS analysis, because it
dissociates readily from the 30S subunits during
centrifugation-based purification. The gel bands corresponding to the
targeted proteins were excised from the gel, reduced with
10 mM dithiothreitol (DTT) and alkylated with 55 mM
iodoacetamide. Then, in-gel digestion was performed with
sequencing-grade modified trypsin (Promega) in 50 mM
ammonium bicarbonate at 37°C overnight. The peptides
were extracted twice with 1% trifluoroacetic acid in 50%
aqueous acetonitrile for 30 min. The extracts
were then concentrated in a SpeedVac to reduce the volume.
Peptides from different samples were labeled with tandem
mass tag (TMT) reagents (Thermo, Pierce
Biotechnology) according to the manufacturer's
instructions (TMT 127, 129 and 130 for the
A19 mature 30S, A19ΔrimM and A19ΔrbfAΔrsgA
samples, respectively). Briefly, the TMT label reagents
were dissolved in anhydrous acetonitrile and carefully
added to each digestion product. The reaction was
performed for 1 h at room temperature, and hydroxylamine
was used to quench the reaction. The TMT-labeled
peptides were desalted using stage tips.

For LC-MS/MS analysis, the TMT-labeled peptides
were separated by a 65-min gradient elution at a flow
rate of 0.250 µl/min with an EASY-nLCII™ integrated
nano-HPLC system (Proxeon), which is directly interfaced
with a Thermo LTQ-Orbitrap mass spectrometer. The
analytical column was a home-made fused silica capillary
column (75 µm ID, 150 mm length; Upchurch) packed
with C-18 resin (300 Å, 5 µm; Varian). Mobile phase A
consisted of 0.1% formic acid and mobile phase B
consisted of 100% acetonitrile and 0.1% formic acid. The
LTQ-Orbitrap mass spectrometer was operated in the
data-dependent acquisition mode using the Xcalibur
2.0.7 software, with a single full-scan mass
spectrum in the Orbitrap (400-1800 m/z, 30 000
resolution) followed by three MS/MS scans in the quadrupole
collision cell using higher energy collision dissociation.

The MS/MS spectra from each LC-MS/MS run were
searched against the selected database using an in-house
Mascot or Proteome Discoverer search algorithm.
Peptides with XCorr/Charge scores >2.75 for 2+
and >3.0 for 3+ were used for protein identification, and
MS/MS spectra for all matched peptides were manually
interpreted and confirmed. The QMS experiments were
repeated three times and similar results were
obtained. For TMT quantification of a specific protein,
ratios of 129:127 and 130:127 for each of the ribosomal
proteins were examined by Grubbs' test to remove
outliers. Ratios of two or more tryptic peptides from the
same protein were used to calculate the means and the
standard deviations (Supplementary Table S1).
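
The outlier-removal and averaging step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' actual pipeline: the peptide ratios are made up, the two-sided α = 0.05 significance level is an assumption (the text does not state one), and the critical values are tabulated constants that should be checked against a statistics reference before any real use.

```python
from statistics import mean, stdev

# Two-sided Grubbs critical values at alpha = 0.05, keyed by sample size N.
# Assumed tabulated values; only N = 3..10 are covered in this sketch.
GRUBBS_CRIT_05 = {3: 1.155, 4: 1.481, 5: 1.715, 6: 1.887,
                  7: 2.020, 8: 2.127, 9: 2.215, 10: 2.290}


def remove_outliers_grubbs(ratios):
    """Iteratively drop the most extreme ratio while Grubbs' test flags it.

    Samples with fewer than 3 or more than 10 values are returned unchanged
    because the lookup table above does not cover them.
    """
    kept = list(ratios)
    while len(kept) in GRUBBS_CRIT_05:
        m, s = mean(kept), stdev(kept)
        if s == 0:
            break  # all values identical, nothing to test
        suspect = max(kept, key=lambda r: abs(r - m))
        g = abs(suspect - m) / s
        if g <= GRUBBS_CRIT_05[len(kept)]:
            break  # most extreme value is not significant
        kept.remove(suspect)
    return kept


def summarize_protein(peptide_ratios):
    """Return (mean, standard deviation, n) of reporter-ion ratios for one
    protein, e.g. 129:127, after outlier removal."""
    kept = remove_outliers_grubbs(peptide_ratios)
    if len(kept) < 2:
        raise ValueError("need ratios from at least two peptides")
    return mean(kept), stdev(kept), len(kept)


if __name__ == "__main__":
    # Hypothetical 129:127 ratios for the tryptic peptides of one protein.
    print(summarize_protein([0.40, 0.42, 0.41, 0.43, 0.95]))
```

A mean ratio of about 0.4 from such a calculation would correspond to an occupancy of roughly 40% relative to the mature 30S subunit, assuming equal loading of the samples.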

Cryo sample preparation and cryo-electron microscopy 

Cryo-grids for the immature 30S subunits were prepared 
as previously described (26). The grids were examined in 
an FEI Tecnai F20 microscope operated at 200 kV, and 
images were recorded at a nominal magnification of
80 000× on a Gatan UltraScan 4000 CCD camera,
under low-dose conditions (~20 e⁻/Å²). The complex of
the immature 30S subunit bound with RimM was
formed by incubating a 40-fold excess of RimM
with the immature 30S subunits at 37°C for 15 min. The
grids of the 30S complex were examined in an FEI Titan
Krios cryo-TEM operated at 300 kV, and images were
collected at a nominal magnification of 59 000× on an
FEI Eagle 4k × 4k CCD camera, under low-dose
conditions. Data collection was done with the AutoEMation
software package (27).

Image processing 

All the micrographs were decimated by a factor of 2. 
Particle picking was performed using the SPIDER 
package (28) with a method based on a locally normalized 
cross-correlation function (29). The resulting particles 
(125 × 125 in window size, 2.76 and 3.0 Å in effective
pixel size, for the 30S and the 30S complex samples,
respectively) were manually verified using a method based
on correspondence analysis (30). To ensure the
performance of the 2D and 3D analysis, particles were further
subjected to another round of manual screening, which
finally rendered 164 368 and 94 535 particles for the 30S
and 30S complex, respectively. The parameters of the
contrast transfer function (CTF) were estimated using 
SPIDER at the micrograph level. Particles were then 
CTF corrected using the phase-flipping method (31). 

2D image classification was performed using a 
maximum-likelihood approach (32) with the XMIPP 
software package (33). Particles from both samples were 
classified into 100 groups in 100 iterations, and the
performance of the classification was monitored by the
log-likelihood function. To facilitate further comparison,
class average images were subjected to a multi-reference 
alignment to 83 2D projections generated from a cryo-EM 
map of the mature 30S subunit (26), at an angular interval 
of 15° (Supplementary Figure S2). 

3D classification was performed using a 3D maximum- 
likelihood approach with XMIPP (34). The initial model 
was generated by low-pass filtering (60 Å) of a cryo-EM
map of the mature 30S subunit (26). Particles from
both samples were classified into five groups in 50
iterations, at an angular sampling of 10°. Refinements of
the class structures were performed with SPIDER, follow- 
ing the standard reference projection matching procedures 
(31), with a gradual decrease of the angular step from 15° 
to 1°. Amplitude correction to the density maps was per- 
formed as previously described (26,35). The final reso- 
lutions of the refined density maps were estimated with a 
soft Gaussian mask approach (36,37) using the 0.5 cutoff
criterion of the Fourier Shell Correlation (Supplementary
Table S2). 
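
To illustrate how a resolution figure follows from the 0.5 cutoff criterion, the sketch below takes a Fourier Shell Correlation curve sampled at increasing spatial frequencies and reports the resolution at which the curve first drops below the cutoff, using linear interpolation between shells. The curve values are invented for illustration, and the function name is hypothetical; the masking and half-map computation used in the study are not reproduced here.

```python
def resolution_at_fsc_cutoff(freqs, fsc, cutoff=0.5):
    """Return the resolution (in Å) where the FSC first falls below `cutoff`.

    freqs : spatial frequencies in 1/Å, in ascending order
    fsc   : FSC value for each frequency shell
    Returns None if the curve never crosses the cutoff from above.
    """
    if len(freqs) != len(fsc):
        raise ValueError("freqs and fsc must have the same length")
    for i in range(1, len(fsc)):
        if fsc[i - 1] >= cutoff > fsc[i]:
            # Linear interpolation between the two shells around the crossing.
            frac = (fsc[i - 1] - cutoff) / (fsc[i - 1] - fsc[i])
            f_cross = freqs[i - 1] + frac * (freqs[i] - freqs[i - 1])
            return 1.0 / f_cross
    return None


if __name__ == "__main__":
    # Invented FSC curve; crossing lies between 0.06 and 0.08 1/Å.
    freqs = [0.02, 0.04, 0.06, 0.08, 0.10]
    fsc = [0.99, 0.90, 0.60, 0.40, 0.10]
    print(resolution_at_fsc_cutoff(freqs, fsc))
```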

Atomic model and temperature map building 

The head and body domains of a 30S subunit crystal 
structure [PDB ID: 3OFA, (38)] were docked into the
cryo-EM maps as rigid bodies first using Chimera (39),
followed by a flexible fitting method based on molecular
dynamics simulation (40) in vacuo for 1 000 000 steps with
a 0.5 kcal mol⁻¹ scaling factor using NAMD (41). To
avoid overfitting, ribosomal proteins S2 and S3 were
removed from the initial model before flexible fitting due
to their low occupancies. For class No. 5 of the immature 
30S subunits (Supplementary Table S2), all proteins in the 
head domain were removed and only the rRNA structure 
was refined. For better comparison, after fitting, S2 and S3 
proteins were added back to the fitted structure using their 
contacting rRNA helices as reference. The 10 models were 
aligned using the 30S body domain as reference and 10 
temperature maps were constructed in PyMOL (42) by 
calculating the deviation of the 16S rRNA in the fitted 
models from the mature 30S structure. The scripts used 
for root-mean-square deviation (RMSD) calculation and 
temperature map visualization were downloaded from 
http://pldserverl.biochem.queensu.ca/~rlc/work/pymol/. 
Chimera and PyMOL were used for graphic visualization. 
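
The quantity behind such a temperature map can be sketched as a per-atom displacement between a fitted model and the mature reference, summarized by an RMSD. The example below is a simplified stand-in, not the downloaded PyMOL scripts: it assumes both coordinate sets are already superposed on the body domain and list the same atoms (for instance one backbone atom per nucleotide) in the same order, and the coordinates are invented.

```python
import math


def per_atom_deviation(fitted, reference):
    """Distance (Å) between matching atoms of two pre-aligned models.

    fitted, reference : sequences of (x, y, z) tuples in the same atom order.
    These per-atom values are what a temperature map would color by.
    """
    if len(fitted) != len(reference):
        raise ValueError("models must contain the same number of atoms")
    return [math.dist(a, b) for a, b in zip(fitted, reference)]


def rmsd(deviations):
    """Root-mean-square deviation over a list of per-atom distances."""
    if not deviations:
        raise ValueError("no atoms to compare")
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


if __name__ == "__main__":
    # Invented coordinates: one atom unchanged, one displaced by 5 Å.
    fitted = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
    mature = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    dev = per_atom_deviation(fitted, mature)
    print(dev, rmsd(dev))
```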

RESULTS 

Construction of a series of E. coli A19 strains 

RNase I is the major non-specific endoribonuclease
localized in the periplasm and is often found in 30S
subunit fractions in cell extracts (43). To avoid undesired 
degradation of the rRNA precursors in the immature 30S 
subunits during the sample preparation, we chose the 
RNase I defective A19 strain (22) as the source of the 
30S subunits. In this genetic background, we further con- 
structed strains with the rimM gene deleted and with both 
rsgA and rbfA genes deleted. The two resulting strains 
grow poorly on LB medium and show an accumulation 
of free 30S subunits (Figure 1). Interestingly, both the cell 
growth test (Figure 1A) and the ribosome profile analysis 
(Figure 1B) show that the deletion of rimM is more
deleterious. As a result, there is an intermediate peak between
the 30S and 50S peaks, probably representing immature
50S precursors caused by globally decreased protein
production in the ΔrimM strain (Figure 1B).

Compositional characterization of the immature 30S
subunits from the A19 ΔrimM and ΔrbfAΔrsgA strains

RNA gel analysis shows that the rRNAs in the 30S
fractions from the ΔrimM strain and the ΔrbfAΔrsgA strain
are 16S rRNA precursors (Figure 2A), indicating that
these free 30S subunits are indeed immature 30S particles.
Identification of the two sets of 16S rRNA precursors by a
previously established 5'3'-rapid amplification of
complementary DNA ends (RACE) technique (44) reveals
that a majority of these precursors are unprocessed at
both the 5'- and 3'-ends (Supplementary Figure S1). The
protein gel analysis shows that some ribosomal proteins,
e.g. S2 and S3, are underrepresented in the ΔrimM sample
(Figure 2B). The compositional heterogeneity suggests
that the immature 30S subunits from the ΔrimM strain
are a collection of in vivo assembly intermediates that are
different in protein composition.

Figure 1. Phenotypes of the ΔrimM and ΔrbfAΔrsgA strains. (A) Spot
assay showing that both the ΔrimM (Δ) and ΔrbfAΔrsgA (ΔΔ) strains
grow slowly, compared with the wild-type strain (WT). (B) Ribosome
profile analysis of the A19, ΔrimM and ΔrbfAΔrsgA strains. The
profile curves of the WT, Δ and ΔΔ strains are colored in black,
green and red, respectively. Deletion of RimM or a combination of
RbfA and RsgA causes an accumulation of immature 30S subunits.
Both experiments indicate that deletion of RimM is more deleterious.

To determine the protein levels, similar to a previously 
established quantification method (45), we used a QMS 
technique based on TMT labeling (46). The QMS data 
reveal that the levels of S21, S10, S14, S13, S19, S3, S2 
and S5 are dramatically reduced in the ΔrimM sample,
to <50% of those in the mature 30S subunits (Figure 2C
and Supplementary Table S1). Most of them are
secondary and tertiary binding proteins from the 3'-head domain
of the 30S subunit, except that S21 and S5 are tertiary
binders from the central domain and the 5'-domain,
respectively. S21 is known to easily dissociate in solution
(47) and is therefore not included for further analysis.
Thus, these data clearly demonstrate that the deletion
of RimM causes a severe delay in the assembly of the
3'-domain of the 30S subunit in vivo (Figure 2C and
Supplementary Figure S3). Among these 3'-domain
proteins, S7 has the highest occupancy (81%) in the
ΔrimM sample, which is in accordance with the in vitro
assembly map, in which S7 is a primary binder and directs the
binding of all the remaining 3'-domain proteins (48).

In contrast, the protein composition of the immature 30S
subunits from the ΔrbfAΔrsgA strain shows intriguing
differences and similarities (Supplementary Figure S3). The
most underrepresented protein in the ΔrbfAΔrsgA
sample is still S21 (30%), followed by S7, S2, S10, S11
and S19 (49-63%) (Figure 2C and Supplementary Table
S1), clearly showing a different pattern. Although many
3'-domain proteins, such as S10, S13, S14, S19 and S3, are also
underrepresented, their levels are significantly higher than
those in the ΔrimM sample (Figure 2D and Supplementary
Table S1). In fact, the ΔrbfAΔrsgA sample has a higher
level for almost all the proteins, compared with the
ΔrimM sample (Figure 2C and Supplementary Figure
S3). For example, S10, S13 and S14 have an over 2-fold
increase and S21, S19 and S3 have a moderate increase,
from 1.5- to 2-fold. Interestingly, two primary proteins, S7
and S4, display significantly lower levels in the
ΔrbfAΔrsgA sample than in the ΔrimM sample (Figure
2C and D). Taken together, an evident pattern is that
the immature 30S subunits from the ΔrbfAΔrsgA strain
have significantly higher occupancies for all the secondary
and tertiary binding proteins in the 3'-domain (Figure 2E),
suggesting that their 3'-head domains are indeed further
matured, with more proteins incorporated.

Figure 2. Compositional characterization of the immature 30S subunits. Composition of mature and immature 30S subunits from the A19 ΔrimM
(Δ) and ΔrbfAΔrsgA (ΔΔ) strains was analyzed at both the RNA and protein levels. (A) RNA gel analysis of the rRNAs in the Δ and ΔΔ samples.
(B) Tricine-SDS-PAGE analysis of the protein composition of the immature 30S subunits. (C) QMS analysis of the protein composition in the Δ and
the ΔΔ samples. Error bars show standard deviations. The difference in protein ratio between the Δ and ΔΔ samples was subjected to a one-tailed
t-test, which reports a significant difference for S10, S14, S13, S3, S12, S5, S4, S7 (P < 0.01), S19 and S6 (P < 0.05). (D) The relative protein ratios of
the ΔΔ sample to the Δ sample (ΔΔ/Δ) are plotted against the ratios of the Δ sample to the mature one (Δ/mature). (E) Atomic structure of the
mature 30S subunit (38) viewed from the inter-subunit and solvent sides, with proteins in the top-left part of (D) colored in green.

This immediately suggests that a role of RimM in vivo is 
to promote the binding of 3'-domain proteins, since the 
immature 30S subunits from the ΔrbfAΔrsgA strain likely
resemble a stage downstream of RimM action. In
agreement with this conclusion, the in vitro kinetic data show
that RimM accelerates the binding of some head domain 
proteins, S19, S10 and S3 (49). 

Structural characterization of the immature 30S subunits 
from the ΔrimM strain

To explore the structural heterogeneity of the immature 
30S subunits, we applied the cryo-EM single-particle
method to our sample. First, a reference-free 2D image
classification technique based on maximum-likelihood op- 
timization (32) was employed to estimate the level of struc- 
tural variation in the cryo-EM particles. The 2D analysis 
reveals that a large number of the class average images 
show smeared densities on the head domain of the 30S 
subunit. In contrast, densities in these average images cor- 
responding to the body domain are nicely resolved, and 
the features of the body domain could be easily identified 
(Supplementary Figure S2). This suggests that the 
immature 30S subunits are truly composed of multiple 
assembly intermediates, with a highly flexible head 
domain and a rather rigid body domain. 

Next, a multi-structure refinement method (34) was
used to investigate possible metastable structural
intermediates at the 3D level. As a result, the particles were
grouped into five classes, and as expected, the five
cryo-EM maps (at 12-14 Å resolution) display dramatic
conformational differences (Figure 3). However, similar
to a previous cryo-EM study on the immature 30S
subunits from a ΔrsgA strain (50), we did not find
significant densities that could be attributed to the
unprocessed ends of the 17S rRNA.

Figure 3. Overview of the five cryo-EM structures of the immature 30S subunits from the ΔrimM strain. The five density maps (A-E or F-J,
respectively) are displayed in transparent surface representation, superimposed with flexibly fitted crystal structures in cartoon representation. For
each map, both the inter-subunit view (A-E) and the solvent view (F-J) are displayed. The 16S rRNA, S2, S3 and the remaining 30S subunit proteins are
colored in blue, green, red and purple, respectively. Deviations of the 16S rRNA backbones in the fitted model from that of the mature 30S subunit
are colored as indicated by the scale to form the temperature maps (K-O).

To facilitate the quantitative comparison of the struc- 
tural data, we built pseudo-atomic models for the five 
cryo-EM maps using a flexible fitting technique (40). 
Based on these models, five temperature maps for the 
16S rRNA were then constructed according to their devi- 
ations from the structure of the mature 30S subunit 
(Figure 3K-O). Structural differences can be directly
identified from these maps. First, the conformational dif- 
ference is dominated by a relatively rigid rotational 
movement of the head domain, which changes the inter- 
domain orientation between the head and the body 
domains. Notably, one class has a nearly 60° rotated
head domain (Figure 3E, J and O). This rotated structure, 
derived from nearly one-third of all the particles 
(Supplementary Table S2), is in fact very similar to one 
of the Group II in vitro assembly intermediates discovered 
in a time-resolved electron microscopy study (11), which 
was shown to miss nearly all the 3'-domain proteins. In 
addition to the rotation, in two classes (Figure 3K and M), 
the channel between the head and the body domains is 
closed up, resulting in a narrowing of the mRNA
entrance. Second, four of the five maps show very incom- 
plete, fragmented densities for helix 44 of the 16S rRNA, 
except for one group (Figure 3A), which is close to the 
conformation of a mature 30S subunit and also has a less 
rotated head domain. Along with the conformational
difference at helix 44, the decoding center is also sharply
different among the five maps (Supplementary
Figure S4). Third, in agreement with the QMS data, 
these structures differ in protein composition, as 
exemplified by S2 and S3 (Figure 3F-J). In fact, none of 
the structures has a full occupancy for both proteins, and
interestingly the occupancy of S2 has no correlation with 
the occupancy of S3 (Supplementary Table S2). This ob- 
servation appears to align well with the in vitro assembly 
data showing that S2 and S3 could bind in independent 
order and the prior binding of S2 ahead of S3 leads to 
kinetically trapped intermediates (11). 

In summary, as seen in the temperature maps, the head 
domain of the 16S rRNA is highly mobile in the immature 
30S subunits. It is known that the motion between the 
head domain and the body domain is intrinsic and is 
believed to be required for the dynamic interaction with 
translational components (51). However, the head domain 
rotation observed in our structures is on a much larger
scale, suggesting that the low level of proteins in the
3'-domain increases its flexibility. This structural
observation demonstrates that the in vivo intermediates from the
ΔrimM strain vary not only in protein composition but
also in rRNA conformation. 

Structural characterization of the ΔrimM immature 30S
subunits bound with RimM 

Next, we examined the binding preference of RimM to the 
immature and mature 30S subunits by pelleting assay. 
While RimM shows almost no binding to the mature
30S subunit, it does bind to the immature subunit, with
a low affinity (Figure 4). In contrast, both RbfA and
RsgA show a considerably higher affinity to the mature
30S subunit containing the 16S rRNA than RimM does,
and in particular, RsgA displays a strong preference for the
mature 30S subunit (25). Therefore, consistent with our QMS
data, the binding preferences of these factors also suggest
that RimM acts prior to RbfA and RsgA in the in vivo
assembly pathway.

We then sought to explore possible structural changes 
of the immature 30S subunits upon RimM binding. Using 
the same 2D and 3D image classification techniques, we 
found that the addition of RimM to the immature 30S 
subunits seems to stabilize the 30S head domain. At the 
2D level, class averages of particles from the RimM- 
treated sample still show smeared densities in the head 
region, but the total fraction of the particles with an 
unstable head domain is significantly smaller (Supplemen- 
tary Figure S2). About 18% of particles from the un- 
treated sample display apparent instability in the head 
domain, whereas the percentage in the treated sample is 
decreased to 11%. 

At the 3D level, similarly, we classified the 
RimM-treated data into five groups, and these cryo-EM 
maps (at 15-19 Å resolution) also differ in conformation
and protein composition (Figure 5). First, the head
domain rotation is apparently on a much smaller scale, as
seen in the temperature maps (Figure 5K-O and
Supplementary Figure S5). Second, regions such as the long
helix 44 and the decoding center still display a large
amount of variation (Supplementary Figure S4),
implying that the final accommodation of helix 44 is
probably a later event, not related to RimM binding.
Third, as expected, the occupancies of S2 and S3 are 
both very low, but surprisingly, the levels of S2 and S3 
seem to be even lower than the untreated sample (Supple- 
mentary Table S2). This finding suggests that RimM sta- 
bilizes the immature 30S subunits in a conformation that 
disfavors S2 and S3 binding, implying that S2 and S3 
binding might be later events in the assembly pathway. 

Therefore, our structural analysis of the cryo-EM
images from the RimM-treated sample shows that in
addition to its role in ribosomal protein assembly,
RimM also has a role in stabilizing the rRNA tertiary
structure in the 3'-domain.

Figure 4. RimM preferentially binds to the immature 30S subunits
from the ΔrimM strain. Immature 30S subunits from the ΔrimM
strain and mature 30S subunits from the 70S ribosomes were incubated
with or without a 30-fold excess of RimM. The mixtures were pelleted
by centrifugation. The pellets (P) and the supernatants (S) were
separated and resolved by SDS-PAGE. RimM alone was centrifuged
as a control. The asterisk denotes the weak binding of RimM to the
immature 30S subunits.

Binding position of RimM on the 30S subunit 

The pelleting assay indicates that the affinity of RimM to 
the immature 30S subunits is very low (Figure 4). This 
apparently sets an obstacle for us to analyze contact 
sites of RimM on the 30S subunit in detail. Fortunately, 
RimM binds to S19 in vitro (20,21) and could co-crystallize
with S19 (PDB ID: 3A1P). Therefore, the binding position
of RimM could be deduced using S19 as a reference
(Figure 6), assuming that RimM does not change its
contacts in the context of the immature 30S subunit. In
fact, the cryo-EM maps of the RimM-treated immature
30S subunits, although prepared with a 40-fold excess of
RimM, have limited densities at locations expected to
have RimM bound when the maps are displayed at a 3σ
level (Figure 5). Densities corresponding to RimM begin
to appear at lower thresholds (Supplementary Figure S6).
Nevertheless, we could compare the average densities stat- 
istically, within a 3D binary mask generated from the 
aligned RimM crystal structure. Consistently, the densities 
at RimM-bound region in the cryo-EM maps from the 
RimM-treated sample are significantly higher than those 
from the untreated sample (Supplementary Figure S6). 
This analysis, albeit rather preliminary, proves that 
RimM is present in these cryo-EM maps. 
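
The masked density comparison described above can be illustrated with a minimal sketch. It is not the procedure used in this study, whose details are given only in the supplementary material. The sketch assumes that each map has been aligned, normalized to a common density scale and flattened in the same voxel order as the mask. The function names (masked_mean, welch_t) and the choice of Welch's t statistic are assumptions made for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def masked_mean(volume, mask):
    """Average density over the voxels selected by a binary mask.

    `volume` and `mask` are flat sequences of equal length, i.e. a 3D map
    and its mask flattened in the same voxel order (an assumption here).
    """
    if len(volume) != len(mask):
        raise ValueError("volume and mask must have the same number of voxels")
    selected = [v for v, m in zip(volume, mask) if m]
    if not selected:
        raise ValueError("mask selects no voxels")
    return mean(selected)


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    se_a = variance(sample_a) / len(sample_a)
    se_b = variance(sample_b) / len(sample_b)
    return (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)


# Toy values only (NOT data from this study): a 4-voxel "map" per class.
mask = [0, 1, 1, 0]
treated_maps = [[0.0, 2.0, 4.0, 0.0], [0.0, 3.0, 5.0, 0.0]]
untreated_maps = [[0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 2.0, 0.0]]

treated = [masked_mean(m, mask) for m in treated_maps]
untreated = [masked_mean(m, mask) for m in untreated_maps]
t_statistic = welch_t(treated, untreated)
```

A positive statistic here would mean that the masked region carries more density, on average, in the treated maps than in the untreated ones. With only a handful of maps per condition, such a test has little power, which is in line with the preliminary nature of the analysis noted above.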

The structure of RimM is composed of two β-barrel-containing
domains (21). While the C-terminal domain is
shown to interact with S19, the N-terminal domain closely
resembles a tRNA-binding domain of EF-Tu (21),
suggesting the ability of RimM to bind to the 16S rRNA.
Consistently, alignment of the structure of the
RimM-S19 complex immediately places the N-terminal
domain of RimM at the junction of several helices, such
as h29, h30 and h42 (Figure 6). Since, prior to RimM
binding, the 30S assembly is at a stage with very few
3'-domain proteins incorporated (Figure 2C), the binding
of RimM at this multi-helix interface might stabilize the
rRNA conformation globally and therefore allow a faster
and/or more stable binding of 3'-domain proteins.



DISCUSSION 

The role of RimM in the assembly of the 30S subunit 

It is known that disturbance of protein translation might
affect subunit assembly indirectly, owing to a
shortage in ribosomal protein production.
Consequently, the impaired subunit assembly in E. coli
strains with genes for assembly factors deleted could stem not
only from the defective assembly process itself but also
from a reduced supply of ribosomal proteins.
Nevertheless, in this study, the composition of the in vivo
intermediates from the ΔrimM and ΔrbfAΔrsgA strains clearly
displays a non-uniform level of ribosomal proteins, with
mostly the 3'-domain proteins significantly
underrepresented (Figure 2), suggesting that the secondary effect
caused by impaired translation in these strains is negligible
and does not overshadow the assembly defects.
Figure 5. Overview of the five cryo-EM structures of the immature 30S subunits treated with RimM. The five density maps (A-E or F-J,
respectively) are displayed in transparent surface representation, superimposed with flexibly fitted crystal structures in cartoon representation. For each map,
both the inter-subunit view (A-E) and the solvent view (F-J) are displayed. The 16S rRNA, S2, S3 and the remaining 30S subunit proteins are painted in
blue, green, red and purple, respectively. Deviations of the 16S rRNA backbones in the fitted model from that of the mature 30S subunit are colored
as indicated by the scale to form the temperature maps (K-O).



The role of RimM uncovered in the present work in fact
serves as a good illustration of the proposed general
function of assembly factors, i.e. to circumvent possible
kinetic traps caused by misfolded rRNA or rate-limiting
binding of certain proteins during the in vivo assembly
process (11,13,49). Previous kinetic data showed that the
binding of 3'-domain proteins does not strictly require the
prebinding of S7 (13), while in contrast, prebinding of S7
and S19 together dramatically accelerates the binding of the
remaining 3'-domain proteins (13), indicating the presence of a
rate-limiting S7-independent assembly pathway for S19
(13,52). Consistently, our data show that in the ΔrimM
sample, S19 is among the most underrepresented proteins,
whereas in the ΔrbfAΔrsgA sample, the level of S19 is
dramatically increased (Figure 2). Thus, it is likely that the role
of RimM in vivo is to counteract the kinetic trap caused by
slow binding of S19. In support of this view, both RimM
and S19 bind to a multi-helix junction of the 16S
3'-domain (Figure 6), highlighting their potential effect on
the global stabilization of the 3'-domain.

Functional interplay of assembly factors 

In addition to the two sets of intermediates described in
this study, another set of in vivo intermediates, isolated
from a ΔrsgA strain, was also quantitatively analyzed
(50). The comparison of these quantitative data from
different genetic backgrounds enables us to identify the
temporal relationship of assembly factors.

First, assembly intermediates isolated from a ΔrsgA
strain have only a very small subset of tertiary binding
proteins (S21, S2 and S3) largely underrepresented (50),
suggesting that RsgA acts at a very late stage when most
of the components are already in place. Second,
intermediates from the ΔrimM strain show underrepresented
levels for all secondary and tertiary binding proteins
from the 3'-domain. Interestingly, the ΔrimM
intermediates are to some extent similar in protein composition to
previously identified in vivo 21S intermediates (44,53), but
differ from the in vitro RI intermediates (54). The close
resemblance of the ΔrimM intermediates to the naturally
populated in vivo 21S intermediates suggests that they
represent an early stage during the assembly of the
3'-domain, likely the entry stage. Last, in contrast, the
intermediates from the ΔrbfAΔrsgA strain, with only a
subset of 3'-domain proteins largely underrepresented,
do not resemble any of the known intermediates,
indicating that they represent a novel set of intermediates
roughly in between.
Figure 6. Mechanistic model of the RimM function in the in vivo assembly of the 3'-domain. (A) The head domain of the 30S subunit viewed from
the inter-subunit side. (B) Same as (A), with a 70° rotation around the y-axis. The h33b, h31 and the rest of the 16S rRNA are painted in cyan, wheat
and blue, respectively. The C-terminal domain of RimM (CTD), N-terminal domain of RimM (NTD), S7, S13, S19 and S14 are painted in magenta,
orange, red, yellow, green and purple, respectively. (C) RimM, RbfA and RsgA act at different checkpoints during the in vivo assembly. The
deficiency of assembly factors diverts the assembly into less efficient branches (colored dashed lines) and causes accumulation of a set of closely related
intermediates (colored boxes). The ribosomal protein levels in the three sets of in vivo intermediates are displayed in gray scale. The data for the
intermediates from a ΔrsgA strain are from a previous study (50). The large conformational differences among the three sets of in vivo intermediates
are also shown in cartoon form: the ΔrimM one (red) with a dramatically rotated head domain and a disordered helix 44; the ΔrbfAΔrsgA one (green)
with a disordered helix 44 only; the ΔrsgA one (blue) with a well-resolved helix 44.



Therefore, the protein spectra of the above three sets of
intermediates clearly suggest an order for the actions of
these factors (Figure 6C), which is consistent with
previous genetic and biochemical data (19,25,55). It
must be noted that it is difficult to unambiguously
place other biogenesis factors in time along the assembly
pathway due to the lack of biochemical data, although
genetic data have suggested both functional redundancy
and hierarchy for some assembly factors [reviewed in (12)].
Nevertheless, if we view the in vivo assembly as a
multi-branched process, the seeming functional
redundancy among assembly factors is merely a sign of altered
contributions of different, inter-connected assembly
pathways. As shown in Figure 6C, the late-stage
assembly in vivo starts with an in vivo 21S intermediate
(44,53) and proceeds along a highly efficient pathway in
the presence of all assembly factors. Assembly factors
come into play at different time points to assist certain
kinetically disfavored assembly events. The disruption of a
factor or a combination of factors would divert the
assembly to less efficient branches and cause accumulation
of a certain category of kinetically trapped intermediates.
Consistent with this view, most of the E. coli genes for
assembly factors are not essential. The remaining
question is whether these kinetically trapped intermediates
from various genetic backgrounds with different factors
disrupted truly represent genuine snapshots of the
assembly process under normal conditions, or different
'dead-end' products that are otherwise elusive under
normal conditions.

The in vivo assembly of the 30S 3'-domain also follows
parallel pathways

Early chemical probing of the 16S rRNA conformation
(56), as well as recent kinetic measurements of protein
binding (9,11), showed that the 3'-domain assembly is the
last event during the in vitro assembly of the 30S subunit,
coincident with the 5'- to 3'-transcription order. On the
other hand, accumulating evidence (9-11,56) suggests that
a major general feature of the in vitro assembly of the 30S
subunit is that the process proceeds along multiple routes.

In this study, we isolated the in vivo assembly
intermediates from two genetically modified E. coli strains. The
most remarkable feature in the protein spectra of these
two sets of intermediates is that they both severely lack
3'-domain proteins, suggesting that the maturation of the
3'-domain is also a rate-limiting process in vivo. These
quantitative data also indicate that the intermediates
from both strains are very heterogeneous in ribosomal
protein composition, which means that they do not represent
a single populated assembly intermediate state, but rather
a collection of multiple related intermediates with more
than one metastable state enriched. These differently
prepared intermediates, although showing a recognizable
temporal relationship, cannot be easily reconciled with a
single continuous assembly pathway.
With respect to the occupancies of individual proteins,
there are a number of exceptions to the well-accepted
Nomura assembly map (5). To name a few, first, an
unexpected observation in our QMS data is that two primary
proteins (S4 and S7) in the ΔrbfAΔrsgA sample show
decreased levels compared with the ΔrimM sample. S4 is
thermodynamically required for subsequent binding of
S16, S12 and S5 to the 5'-domain (5), and more
importantly, S4 was shown to have a global stabilization effect
on the 5'-domain (57). S7 is the only primary protein in the
3'-head domain and directs the subsequent binding of S9,
S13 and S19 (5,48). However, in the ΔrbfAΔrsgA sample,
the occupancy of S4 and S7 (Figure 2) is even lower than
that of their follower proteins. Similarly, the level of S7 was also
reported to be lower than that of its followers S9, S13 and S19
in the ΔrsgA sample (50). Second, levels of S10 and S14
are lower than those of their follower tertiary proteins S2 and S3 in
the ΔrimM sample, suggesting that S3 could bind
independently of S10. Third, the level of S2 is lower than that of S3 in
the ΔrbfAΔrsgA sample, although S3 binding is
thermodynamically dependent on the prior binding of S2 in the
Nomura map.
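
The comparison made in this paragraph, i.e. flagging cases where a prerequisite protein is less occupied than a protein that depends on it, can be sketched as follows. This is an illustrative sketch only: the dependency table contains just the relations named in the paragraph above (not the full Nomura map), the function name find_violations is hypothetical, and the occupancy values are placeholders rather than measurements from this study.

```python
# Prerequisite -> follower relations as named in the paragraph above.
# This is only a subset of the Nomura map, transcribed from the text.
NOMURA_SUBSET = {
    "S4": ["S16", "S12", "S5"],
    "S7": ["S9", "S13", "S19"],
    "S10": ["S2", "S3"],
    "S14": ["S2", "S3"],
    "S2": ["S3"],
}


def find_violations(dependencies, occupancy):
    """Return (prerequisite, follower) pairs where the prerequisite
    protein has a lower occupancy than its follower.

    Pairs with a missing occupancy value are skipped rather than guessed.
    """
    violations = []
    for prerequisite, followers in dependencies.items():
        for follower in followers:
            if prerequisite not in occupancy or follower not in occupancy:
                continue
            if occupancy[prerequisite] < occupancy[follower]:
                violations.append((prerequisite, follower))
    return violations


# Placeholder occupancies (relative to mature 30S); NOT data from this study.
example_occupancy = {"S7": 0.3, "S9": 0.5, "S13": 0.2, "S19": 0.4}
example_violations = find_violations(NOMURA_SUBSET, example_occupancy)
```

In this toy input, S7 is less occupied than S9 and S19 but not S13, so only the first two pairs are reported. Such a listing merely locates apparent deviations from the equilibrium map; it does not by itself distinguish alternative pathways from measurement uncertainty.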

The Nomura map was derived from single protein
omission reconstitution experiments with fully processed
16S rRNA under equilibrium conditions and therefore
does not necessarily reflect the true order of the series of
binding events during assembly (52). In addition to our
QMS data, deviations from the Nomura map have already
been observed in both in vivo and in vitro studies.
Previous genetic data show that S15, a primary protein
in the central domain, is dispensable for the 30S
assembly in vivo (58). Furthermore, in vitro kinetic data
from the Williamson group based on PC-QMS (11,13) or
fluorescence triple correlation spectroscopy (52) indicate
that S9 and S19 could bind independently of S7 (13,52),
and S2 could bind independently of S3 (11). All these data
suggest that there are hidden assembly pathways that
could not be directly inferred from the Nomura map.

Thus, the seeming discrepancy between our QMS data
and the Nomura map could be easily reconciled if we view
the in vivo assembly process as a highly branched network.
In the presence of not fully processed 17S rRNA and the
absence of certain assembly factors, the 30S assembly
in vivo takes alternative, kinetically inefficient pathways
that are not predicted by the Nomura map. Therefore, the
assembly intermediates isolated from these different
assembly factor-deficient mutants might represent
intermediates kinetically trapped in various parallel branches
of the assembly network.

In summary, although the number of possible assembly
pathways in vivo is limited by the co-transcriptional nature
of assembly and the presence of assembly factors,
our data provide additional strong evidence for the
emerging idea that the in vivo assembly also proceeds
along parallel pathways to a certain degree (54,58).

The maturation of the 16S rRNA 3'-domain in vivo is
highly coupled with protein assembly

Our structural data reveal that the in vivo assembly
intermediates differ largely in rRNA conformation. Through
the integration of structural (26,50) and QMS data [the
present work and (50)], we conclude that
the 3'-domain of the 16S rRNA matures in a progressive
manner in vivo, in parallel with ribosomal protein
assembly.

First, assembly intermediates from ΔrimM cells show
dramatic conformational differences in the position of the
3'-domain (Figure 3 and Supplementary Figure S2). In
contrast, cryo-EM structures of intermediates from
ΔrsgA cells (50) also vary in the 3'-domain position, but
on a much smaller scale. Second, the long helix 44 of
the 3'-minor domain is highly flexible and is almost
invisible in the structures of intermediates from ΔrimM cells
(Figures 3 and 5). However, cryo-EM structures of 30S
intermediates from ΔrsgA cells (50), which also contain
17S rRNA, display well-resolved densities for helix 44,
except for the upper decoding center region. This
suggests that helix 44 adopts its mature conformation
only at a very late stage. In fact, hydroxyl radical
probing data (10,59) already showed that the full
accommodation of helix 44 is a late event even when the
experiments were performed with the 16S rRNA. Third, further
downstream is the cryo-EM structure of the 30S-RsgA
complex, which displays an almost identical conformation
to the mature 30S subunit (26).

Based on the above structural comparisons, the
maturation of the 3'-domain of the 16S rRNA follows the
transcription order in a progressive manner, first the 3'-head
domain and next the 3'-minor domain (Figure 6C). More
importantly, the conformational maturation is coupled
with a gradual increase in protein levels (Figure 6C).
Therefore, our data have revealed another common
characteristic shared by the in vivo and the in vitro processes,
i.e. a high cooperativity between protein binding and
rRNA folding (10,11,60,61).